The Florida Teacher Certification Examination (FTCE) 
is based upon selected competencies that have been identified as 
minimal entry-level skills for prospective teachers. A description is 
given of the four subtests which make up the FTCE: (1) writing: an essay 
on general topics; (2) reading: a multiple-choice "cloze" procedure on 
general education passages derived from textbooks, journals, and 
state publications; (3) mathematics: multiple-choice questions on 
basic mathematics, simple computation, and "real world" problems; and 
(4) professional education: multiple-choice questions on general 
education. The report presents statistics on the number and percent of 
students (1981-82) passing all subtests, by education major or program. 
The psychometric characteristics of validity, reliability, item 
discrimination, and contrasting group performance of the FTCE are 
discussed. Appendices provide information on: (1) essential competencies 
and subskills tested; (2) mathematical illustrations of formulas; 
(3) security and quality control procedures; and (4) scoring the writing 
examination. 

INTRODUCTION 



Background 

Each applicant for an initial Florida teacher's certificate must pass the 
Florida Teacher Certification Examination (FTCE). The FTCE was established by 
Section 231.17, Florida Statutes, and is administered by the Florida Department 
of Education. 

The competencies that form the basis for the Florida Teacher Certification 
Examination were identified through a study conducted by the Council on Teacher 
Education (COTE).¹ As a result of the study, twenty-three Essential Generic 
Competencies were established upon which to base the Examination and to form a 
part of the curricular requirements at Florida colleges and universities with 
approved teacher education programs. Later legislative action combined two of 
the competencies, numbers six and nineteen, and created an additional competency 
dealing with education for exceptional students. 

An ad hoc task force convened by the Department of Education developed 
subskills for the identified competencies. The subskills were reviewed and 
critiqued by various individuals and organizations including a random sample of 
certified education personnel, statewide professional teacher organizations, and 
all colleges and universities with approved teacher education programs. The 
twenty-three Essential Generic Competencies and the subskills are listed in 
Appendix A. 

Test item specifications were written for each subskill. Specifications 
are rules and parameters for writing test items to measure a particular 
subskill. They provide information such as the length of the stimuli, the mode 
of the stimuli (graph, problem situation, mathematical algorithms), the 
characteristics of the stem (question, statement completion), the 
characteristics of the correct answer, and the characteristics of the foils. 
The specifications also include detailed information about the content upon 
which the tests are based. The complete specifications are contained in the 
Florida Teacher Certification Examination Bulletin II: The General Education 
Subtests: Reading, Writing, Mathematics and in the Florida Teacher 
Certification Examination Bulletin III: The Professional Education Subtest. 
Copies are available from the Department for a nominal fee. 

Passing scores for each subtest were recommended by a panel of judges, all of 
whom were either current or past members of COTE and who had been involved in 
the development of the Examination. The panel was made up of classroom 
teachers, school administrators, teacher educators, and community 
representatives. Passing score recommendations were made to the Commissioner of 
Education for each subtest. These recommendations were adopted as a rule by the 
State Board of Education on July 30, 1980. 

¹ COTE was a statutory advisory council appointed by the State Board of 
Education to advise the Commissioner of Education on all matters dealing with 
teacher education and certification. COTE was replaced by the Florida Education 
Standards Commission in 1980. 

The operational tasks of preparing test forms, administering the tests, and 
scoring the answer sheets are completed through an external contract. The 
contract for these tasks was awarded to the University of Florida Office of 
Instructional Resources for the three administrations of the 1981-82 school 
year. 

Periodically, contracts are issued for the development of additional test 
items. New items are needed to maintain a large pool of high quality and secure 
test items. A large item pool makes it possible to develop alternate forms of 
the test so that an examinee who retakes a subtest will receive a new set of 
questions. 

All development is subject to the restrictions of the item specifications. 
Test development contractors must provide intensive item reviews and conduct 
pilot tests of the items. Following this, the Department invites a panel of 
college and university educators to review the new items. This review consists 
of a critical reading of each item for possible bias, adequate subject content, 
and adequate technical quality. After the new items have been thoroughly 
reviewed and revised, they are field-tested by embedding them in a regular test 
form and administering them to a sample of examinees. The item difficulties are 
calibrated with latent trait techniques and equated to existing items. Later 
forms of the FTCE contain the new items. 



Description of the Examination 

The FTCE is administered three times a year at sites throughout Florida. 
The test takes an entire Saturday to complete. Examinees usually receive their 
results within one month. Examinees who fail any part of the FTCE may retake 
that portion at a subsequent administration. The FTCE is a written test 
composed of four subtests. The characteristics of the four subtests are 
summarized in Table 1. 

TABLE 1 

A Description of the Four Subtests 
of the 
Florida Teacher Certification Examination 

Subtest       Competency   Type of Question      Content                         Scoring
              Tested

Writing       [illegible]  Essay (writing        General topics                  Holistic scoring
                           production)                                           by trained experts

Reading       [illegible]  Multiple choice       General education passages      Objective
                           ("cloze" procedure)   derived from textbooks,
                                                 journals, state publications

Mathematics   5            Multiple choice       Basic mathematics: simple       Objective
                                                 computation and "real world"
                                                 problems

Professional  6, 7, 9-18,  Multiple choice       General education (personal,    Objective
Education     20-24        (problem solving,     social, academic development,
                           application level)    administrative skills,
                                                 exceptional student education)

The Writing Subtest is scored holistically (general impression marking) by 
three trained judges. The scoring criteria include an assessment of the 
following: 

1. Using language appropriate to the topic and reader 

2. Applying basic mechanics of writing 

3. Applying appropriate sentence structure 

4. Applying basic techniques of organization 

5. Applying standard English usage 

6. Focusing on the topic 

7. Developing ideas and covering the topic 

More detailed information on FTCE administrations is contained in the 
Florida Teacher Certification Examination Registration Bulletin. This booklet 
is available free from Florida school district offices and from the Department 
of Education. 

Four bulletins have been developed to provide information about the 
development of the FTCE. The subtest and item specifications have been 
published in Bulletin I: Overview, Bulletin II: The General Education 
Subtests: Reading, Writing, Mathematics, and Bulletin III: The Professional 
Education Subtest. Bulletin IV: The Technical Manual describes the technical 
adequacy of the first examination. The first three bulletins were distributed to 
all Florida teacher education institutions and school system personnel offices 
in the fall of 1979. Bulletin IV, designed primarily for measurement 
professionals, was published in 1981. An overview of the coverage of the FTCE 
is provided in Appendix C of Bulletin IV. An annual Technical Report is 
produced to describe the psychometric characteristics of the three tests 
administered during each academic year. This report covers the 1981-1982 
Examinations. 

Rasch Calibration of Items 

Calibration of items is conducted using Rasch methodology and the BICAL 
computer program. The Rasch model bases the probability of a particular score 
on two parameters, the person's ability and the item's difficulty. The model is 
expressed as: 

$$P(X_{vi} \mid B_v, \delta_i) = \frac{\exp[X_{vi}(B_v - \delta_i)]}{1 + \exp(B_v - \delta_i)}$$

in which $X_{vi}$ = a score 
         $B_v$ = person ability 
         $\delta_i$ = item difficulty 

Estimates of person ability and item difficulty are obtained using maximum 
likelihood estimation as described in Wright, Mead, and Bell (BICAL: 
Calibrating Items with the Rasch Model, 1980). 
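
The model above can be evaluated directly for a single person and item. The 
following is a minimal sketch in Python, assuming a dichotomous score (1 for a 
correct answer, 0 otherwise) and ability and difficulty both expressed in 
logits. The function name is illustrative only and is not part of BICAL. 

```python
import math


def rasch_probability(score: int, ability: float, difficulty: float) -> float:
    """Probability of a score (0 or 1) under the dichotomous Rasch model."""
    if score not in (0, 1):
        raise ValueError("score must be 0 (incorrect) or 1 (correct)")
    logit = ability - difficulty
    # exp[x(B - d)] / [1 + exp(B - d)]
    return math.exp(score * logit) / (1.0 + math.exp(logit))


# A person whose ability equals the item difficulty has an even chance.
p_equal = rasch_probability(1, ability=0.0, difficulty=0.0)

# A person one logit above the item difficulty is more likely to succeed.
p_above = rasch_probability(1, ability=1.0, difficulty=0.0)
```

Because only the difference between ability and difficulty enters the 
formula, persons and items can be placed on the same logit scale. 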

The process of obtaining item difficulties for new items involves field 
testing experimental items within regularly administered test forms. Multiple 
forms for each administration are composed of a set of scored items in each form 
and different sets of experimental items. A subset of the scored items forms a 
common link between forms. The new items are calibrated to the same scale as 
the regular items. All items are then linked to the base scale of November 1980 
by a linking constant. This linking constant is the difference between the 
average calibration values for the common items in November 1980 and their mean 
difficulty in the current administration. A description of this process can be 
found in Ryan ( Item Banking , 1980). 
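
The linking step described above amounts to a single additive shift. The 
sketch below is an illustration under stated assumptions, not the procedure 
documented in Ryan (1980): item difficulties are held in dictionaries keyed by 
hypothetical item codes, and the difficulty values are invented for the 
example. 

```python
from statistics import mean


def linking_constant(base_difficulties, current_difficulties, common_items):
    """Mean base-scale difficulty of the common items minus their mean
    difficulty in the current administration."""
    base_mean = mean(base_difficulties[item] for item in common_items)
    current_mean = mean(current_difficulties[item] for item in common_items)
    return base_mean - current_mean


def link_to_base(current_difficulties, constant):
    """Shift every current calibration onto the base scale."""
    return {item: value + constant for item, value in current_difficulties.items()}


# Hypothetical calibrations (logits) for three common items and one new item.
base = {"A": -0.5, "B": 0.25, "C": 1.0}
current = {"A": -0.75, "B": 0.0, "C": 0.75, "NEW1": 0.5}

constant = linking_constant(base, current, common_items=["A", "B", "C"])
linked = link_to_base(current, constant)
```

In this example the common items are 0.25 logits easier in the current 
calibration than on the base scale, so the new item moves from 0.5 to 0.75. 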

Following each administration, the data are randomly divided into three 
sets of 700 candidates each. Candidates are assigned in sequential order to the 
appropriate data set. Calibrations are conducted on the data of the candidates 
in each set and the mean difficulty values across the data are calculated for 
each item. 
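
The division into three calibration sets and the averaging of the resulting 
difficulties can be sketched as follows. This is an illustration only: it 
assumes candidates are dealt to the sets in rotation, and it takes the three 
sets of calibrated difficulties as given rather than running a Rasch 
calibration. All identifiers and values are hypothetical. 

```python
from statistics import mean


def assign_sequentially(candidate_ids, n_sets=3, set_size=700):
    """Deal candidates to the data sets in rotation until each set is full."""
    data_sets = [[] for _ in range(n_sets)]
    for position, candidate in enumerate(candidate_ids):
        target = data_sets[position % n_sets]
        if len(target) < set_size:
            target.append(candidate)
    return data_sets


def mean_item_difficulty(calibrations):
    """Average each item's difficulty across the separate calibrations."""
    items = calibrations[0].keys()
    return {item: mean(c[item] for c in calibrations) for item in items}


# Hypothetical administration with 2,500 candidates.
data_sets = assign_sequentially(list(range(2500)))

# Hypothetical difficulties (logits) from the three calibration runs.
runs = [
    {"item1": -0.25, "item2": 0.5},
    {"item1": 0.0, "item2": 0.75},
    {"item1": 0.25, "item2": 1.0},
]
averaged = mean_item_difficulty(runs)
```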



TEST COMPOSITION, ADMINISTRATION, AND SCORING 



Test Creation and Assembly 

The items contained in the Department of Education item bank are calibrated 
and equated to the base scale established during the April 1980 field test. The 
items are given identification codes and detailed information on the item usage 
is maintained, including the identification of the form on which each item was 
used, the difficulty value, item point-biserial correlation, and Rasch fit 
statistics for each item. 

Each test form is designed to ensure that the items (a) fit the item 
specifications for the skill that they were designed to measure, (b) conform to 
the test specifications in number and type, and (c) represent a range of 
difficulty with a mean difficulty approximating zero logits. 

A test blueprint is prepared for each form. Items are selected and 
subjected to content, style, and statistical reviews by the Office of 
Instructional Resources at the University of Florida and by the Florida Department 
of Education. Test items are screened for content overlap. 

Placement of the items on the test is primarily a function of appearance 
and content. The order of the items is not related to their difficulty. Items 
are grouped together if they are similar in editorial style, directions, and 
question stems. 

Experimental items are field-tested within each subtest but are not counted 
in a candidate's score. When multiple test forms are used, the core of regular 
(scored) items in each form remains the same for any administration. Test forms 
are spiralled so that each test center receives approximately the same number of 
each form. In this way, all experimental items are field-tested by at least 400 
candidates who represent a cross-section of the people who take the Examination. 
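
Spiralling can be pictured as dealing the forms out in a fixed rotation. The 
sketch below is an assumption about how such a rotation could be produced, not 
a description of the contractor's packing procedure. The form labels and 
booklet counts are hypothetical. 

```python
from collections import Counter
from itertools import cycle


def spiral_forms(form_ids, booklets_per_center):
    """Assign forms in rotation so each center gets a near-equal mix."""
    rotation = cycle(form_ids)
    return {
        center: [next(rotation) for _ in range(count)]
        for center, count in booklets_per_center.items()
    }


# Hypothetical shipment: three forms, two test centers.
packing = spiral_forms(["A", "B", "C"], {"Tampa": 7, "Miami": 5})
tampa_mix = Counter(packing["Tampa"])
miami_mix = Counter(packing["Miami"])
```

Within any one center the counts of the forms differ by at most one, which 
is what gives each set of experimental items a comparable cross-section of 
candidates. 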

Once the form has been approved, the scoring key is verified. Staff 
members from the Department of Education, the Office of Instructional Resources, 
and three teachers from the public schools take the Examination. These persons 
are also asked to identify any ambiguous items or confusing directions. 

Camera-ready copy is prepared by a test specialist and a graphic artist. 
Attention is paid to the proper placement of items to provide workspace where 
necessary. The camera-ready copy is again critiqued by the staff in the 
Department of Education and the Office of Instructional Resources. Corrections 
are made, the copy is sent to the printer, and a final check of the proof is 
made before the tests are printed. 



Administration Procedures 



Examination Dates, Times, and Locations 

The FTCE is administered in the fall, winter, and summer of each year. 
Administration dates for 1981-82 were October 31, 1981; February 27, 1982; and 
July 10, 1982. Candidates were permitted to take all four subtests or any 
subtest previously not passed. Thirteen locations in the state were designated 
as testing areas. Specific sites within each area were selected as test 
centers. These centers were selected from the pool of established centers for 
the administration of standardized examinations. Designated test locations for 
the 1981-82 administrations were: 



 1. Pensacola 
 2. Tallahassee 
 3. Gainesville 
 4. Jacksonville 
 5. St. Petersburg 
 6. Tampa 
 7. Sarasota 
 8. Miami 
 9. Fort Myers 
10. Orlando 
11. Boca Raton 
12. DeLand 
13. Lakeland 



All test centers were inspected to ensure that the rooms met the required 
specifications for lighting, seating capacity, storage facilities, air 
conditioning, and protection from outside disturbances. All facilities were 
able to accommodate handicapped candidates. 

The test schedule is divided into morning and afternoon sessions. Testing 
time is fixed but allows adequate time for candidates to complete all sections 
of the Examination. Candidates may continue to the Reading Subtest after they 
finish the Mathematics section. The schedule for each subtest is listed below: 



Writing                   45 minutes     9:00 a.m. -  9:45 a.m. 
Mathematics               70 minutes    10:00 a.m. - 11:10 a.m. 
Reading                   50 minutes    11:10 a.m. - 12:00 noon 
Break                     60 minutes    12:00 noon -  1:00 p.m. 
Professional Education   150 minutes     1:30 p.m. -  4:00 p.m. 



A security plan has been developed and implemented for the program. Refer to 
Appendix C for further information about security and quality control. 



Special arrangements are made as necessary for handicapped candidates. A 
Braille version of the Examination is available. Typewriters or a flexible time 
schedule is permitted for handicapped candidates. 



Test Manuals 



Uniform testing procedures were established for use at all centers 
throughout the state. Documentation of the procedures is available in the Test 
Administration Manual for the program. The administration manual includes the 
following topics: 

1. Duties of the Test Center Personnel 

2. Receipt and Security of Test Materials 

3. Admission, Identification and Seating of Candidates 

4. On-site Test Administration Practices and Policies 

5. Timing of Subtest Sections 

6. Instructions for Completing Answer Documents 

7. Special Arrangements for Handicapped Students 

Additional information to candidates is found in several other sources. 
Candidates are notified about Examination requirements, locations, and 
procedures in the Registration Bulletin. Specific directions to candidates 
about the assigned test center and necessary supplies are printed on the 
Admission Ticket. 



Scoring and Reporting 

Scoring 

The scoring process begins with a hand edit of the answer sheets, followed 
with the scanning of an initial set of sheets to verify the accuracy of the 
scanner, the key, and the scoring programs. The remaining sheets are scanned, 
and the data are divided into sets of 700 candidates. Three data sets are drawn 
using a systematic random sampling method for the calibration of items using the 
Rasch methodology. The items are adjusted to the base scale established by the 
April 1980 field test. A score table of equivalent raw scores to ability logits 
is calculated and used to determine the ability logits for the remaining 
candidates. Each person's score in ability logits is then transformed to a 
scale score with 200 as the minimum passing score. For a discussion of the 
procedures used to establish the cutting score, see the technical discussion in 
Bulletin IV. 

The essay is rated by three readers who use a four-point scale defined in 
State Board of Education rules. The resulting scores range from three to twelve 
points. The passing standard is set at six points. Details of the criteria for 
the rating of essays are available in Bulletin II. 
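
The arithmetic of the essay score can be sketched as below. This is an 
illustration, assuming that the three ratings are simply summed and that a 
total equal to the passing standard counts as a pass. The referee step used to 
reconcile discrepant ratings is not modeled, and the function name is 
hypothetical. 

```python
def score_essay(ratings, passing_standard=6):
    """Sum three ratings on the four-point scale and apply the standard."""
    if len(ratings) != 3:
        raise ValueError("exactly three reader ratings are required")
    if any(rating not in (1, 2, 3, 4) for rating in ratings):
        raise ValueError("each rating must be 1, 2, 3, or 4")
    total = sum(ratings)
    # Assumption: a total equal to the passing standard is a pass.
    return total, total >= passing_standard


lowest = score_essay([1, 1, 1])      # minimum possible total
borderline = score_essay([2, 2, 2])  # total exactly at the standard
highest = score_essay([4, 4, 4])     # maximum possible total
```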

Reporting 

The reports generated for each administration include a candidate report 
and score interpretation guide, reports for institutions, and state-level 
reports. 

Candidate reports indicate whether or not a test is passed; scaled scores 
are reported only for tests failed. Scores above the passing standard are not 
reported. However, candidates who fail one or more tests are provided their 
scale score for each subtest failed. A detailed analysis of performance is 
provided to individuals who fail the Professional Education Subtest. 

The reports generated for the institutions and the state are listed below: 

1. Number and Percent Passing for: 

   a. Each subtest 
   b. All four subtests 
   c. Three, two, one or no subtests 

2. Number, Percent Passing and Mean Scores for Each Subtest and the 
   Total Examination by All Candidates and: 

   a. First-time candidates 
   b. Re-take candidates 
   c. Vocational candidates 
   d. Non-vocational candidates 
   e. Florida candidates 
   f. Non-Florida candidates 
   g. Florida candidates from approved degree programs 
   h. Florida candidates from non-approved programs 
   i. Sex and ethnic categories 

3. Number and Percent of Candidates by Florida Institutions and by 
   Programs, Passing All Subtests and Each Subtest 

4. Number and Percent of Candidates Passing All Subtests and Each 
   Subtest by Program Statewide 

5. Frequency Distribution for All Candidates for Each Subtest by Sex 
   and Ethnic Category 

6. Frequency Distribution for Each Subtest for Florida Institution 

Statistical analyses of data are reported in the sections on the 
psychometric characteristics of the Examination. 

TEST RESULTS FOR 1981-82 



Results for the three test administrations² in this report are summarized 
in this chapter. The overall passing rates are shown in Table 2. As 
can be seen from the data, there are no differences between first-time and 
all candidates for the first two administrations, and two percentage points 
higher performance for first-time takers in February 1982. The February 
1982 administration was the first one that showed a large effect from 
retakers. This effect is not unexpected. 

Table 2 

Percent of Candidates Passing All Subtests 
of the 
Florida Teacher Certification Examination 
August 1981 - February 1982 

                   First-Time      All 
                   Candidates      Candidates 

August 1981            80              80 
October 1981           84              84 
February 1982          86              84 

Tables 3 through 5 on the following pages show the number and percent 
passing each subtest and all subtests for: (a) approved program candidates; 
(b) non-approved program candidates; and (c) vocational technical candidates. 

² The August 1981 administration was a part of the data for the 1980-81 
Technical Report. It is also in this report because of a decision to make 
the Technical Report coincide with the same test administrations that are 
used in calculating the eighty percent report for approved programs. The 
eighty percent performance report is calculated from the summer, fall, 
and winter administrations. 






Table 3 

Florida Teacher Certification Examination Number and Percent Passing 
All Subtests and Each Subtest by Program Statewide for 
Approved Program Candidates 

August 1981; October 1981; and February 1982 Administrations 

Table 5 

Florida Teacher Certification Examination Number and Percent Passing 
All Subtests and Each Subtest by Program Statewide for 
Vocational Technical Candidates 

August 1981; October 1981; and February 1982 Administrations 

                     TOTAL #     # PASSED     % PASSED 

TOTAL TEST             422         179          42% 
READING                369         263          71% 
MATH                   379         273          72% 
PROF. EDUCATION        354         265          75% 
WRITING                361         247          68% 

PSYCHOMETRIC CHARACTERISTICS 



The psychometric characteristics of validity, reliability, item 
discrimination, and contrasting group performance of the Florida Teacher 
Certification Examination (FTCE) will be addressed in this section. Knowledge 
of the psychometric characteristics of assessment tests is necessary for 
evaluating the tests. 

Validity 

Validity refers to the relevance of inferences that are made from test 
scores or other forms of assessment. The validity of a test can be defined as 
the degree to which a test measures what it was intended to measure. Validity 
is not an all-or-none characteristic, but a matter of degree. Validity is 
needed to ensure the accuracy of information that is inferred from a test score. 

Specific types of validation techniques traditionally used to summarize 
educational and psychological test use (criterion-related validity, both 
predictive and concurrent; content validity; and construct validity) are 
described in Standards for Educational and Psychological Measurement (APA, 1974, 
pp. 26-31). For the FTCE, the primary validity issue that must be addressed is 
the question of content validity. Content validity demonstrates that test 
behaviors constitute a representative sample of behaviors in a desired 
performance domain. The intended domain of the FTCE is that of entry-level 
skills as identified in the statute requiring the Examination as a basis for 
certification. This statute (231.17, F.S.) provides that 

Beginning July 1, 1980 ... each applicant for initial 
certification shall demonstrate, on a comprehensive written 
examination and through such other procedures as may be 
specified by the state board, mastery of those minimum 
essential generic and specialization competencies and other 
criteria as shall be adopted into rules by the state board. 

The statute addresses only the status at certification and does not require 
that inferences be made from test scores to future success as a classroom 
teacher. No claims have been made with regard to measurement of specific 
aptitudes or traits, and no attempt has been made to establish relationships 
between the FTCE and independent concurrent or future criteria. It is only 
claimed that the test adequately measures the skills for which it was developed. 
The construct and criterion-related validation approaches are not appropriate to 
the validity issues related to development and use of the FTCE. 

The content validity of the FTCE rests upon the procedures used to describe 
and develop test items and content areas. The intended coverage of the test was 
determined by a process involving professional consensus to (1) identify 
competencies which should be demonstrated as a condition for certification, and 
(2) identify subskills associated with each competency. The procedures by which 
the intended coverage was identified included surveys of the profession, reviews 
by the Council on Teacher Education (COTE), reviews by the ad hoc COTE task 
force, and reviews by teachers and other professional personnel. 

The general procedures used in test development were as follows: 

1. The intended test coverage was identified and explicated. Competencies 
   and subskills associated with each competency were identified and 
   validated. 

2. Test item specifications were developed and validated. 

3. Draft items were written according to test item specifications and 
   pilot-tested on a small sample of senior students preparing to be 
   teachers. 

4. The final item review consisted of (a) a review by a special 
   panel comprised of classroom teachers, teacher educators, and 
   administrators, and (b) item field-testing with seniors who 
   were in teacher education programs. This was followed by 
   another review by Department of Education staff. Items were 
   subsequently placed in the item bank for future use. 

5. Field-test data were reviewed by Department of Education staff. Items 
   that did not perform well were deleted from the item bank or revised 
   and field-tested again. 

For the final item review process outlined in the fourth step, the items 
were divided by test area and reviewers were divided by area of expertise. The 
process included a review of item content, group differences in performance, and 
technical quality. Bulletin IV (pp. 13-17) contains further information about 
the development and review of test items. 

In summary, the validity of the Examination has been well established as a 
result of (1) the extensive involvement of education professionals in the 
identification and explication of the necessary competencies and their 
associated subskills, (2) the precise item specifications which guided the item 
writers, and (3) the reviews of the items and the competencies/skills that they 
were designed to measure. 

Reliability of Test Scores 

Reliability refers to the consistency between two measures of the same 
performance domain. Although reliability does not ensure validity, it limits 
the extent to which a test is valid for a particular purpose. The main 
reliability consideration for the FTCE multiple-choice tests (Reading, 
Mathematics, and Professional Education) is the reliability of an individual's 
score. For the Writing test, a production writing sample, the reliability 
consideration is the reliability of the judges' ratings. The data in this 
section refer to the three FTCE administrations between October 1981 and July 
1982. For information about field test reliability data, refer to Bulletin IV 
(1981). 

Reliability of Multiple-choice Tests 



A test score is composed of a "true" score ("domain" score) and an "error" 
score. If an individual took several forms of a test, all constructed by 
sampling from the defined item domain, scores of the various test forms would 
not vary except as a result of random errors associated with item sampling 
errors and changes within an individual from one test to another such as 
attention, fatigue, or interest. 

Reliability evidence is generally of two types: (a) internal consistency, 
which is essential if items are viewed as a sample from a relatively homogeneous 
universe; and (b) consistency over time, which is important for tests that are 
used for repeated measurement. For the FTCE, the primary reliability issue is 
that of internal consistency. Since one form of the test is administered to 
examinees at each administration, the reliability concern is that of consistency 
of items within that particular test (homogeneity of items). A test can be 
regarded as composed of as many parallel tests as the test has items, and every 
item is treated as parallel to the other items. In such a case, the appropriate 
reliability index is the Kuder-Richardson Formula 20 (KR-20) index. The KR-20 
formula is shown in Appendix B. 

The KR-20 index estimates the internal-consistency reliability of a test 
from statistical data on individual items. Separate KR-20 coefficients are 
calculated for the Reading, Mathematics, and Professional Education subtests for 
each FTCE administration. A high coefficient indicates that a test accurately 
measures some characteristic of persons taking it and means that the individual 
test items are highly correlated. The subtest KR-20 coefficients for the three 
1981-1982 test administrations were above .78, indicating that the individual 
test items were highly consistent measures of the three subject areas assessed. 
Refer to Table 6 for the KR-20 coefficients. 
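
The KR-20 computation can be sketched from a matrix of scored responses. The 
version below uses the standard textbook form of the formula, 
KR-20 = [k / (k - 1)] [1 - sum(p q) / variance of total scores], with the 
population variance of the total scores. Whether Appendix B uses exactly this 
variance convention is an assumption, and the response data are invented for 
the example. 

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)  # population variance of total scores
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_persons  # proportion correct
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical responses: five examinees (rows) by four items (columns).
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(sample_responses)
```

For these data the sum of the item variances is 0.8 and the variance of the 
total scores is 2, so the coefficient is (4/3)(1 - 0.4) = 0.8. 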



Table 6 

Kuder-Richardson Coefficients 

                                        Professional 
                   Math     Reading     Education 

October 1981       .88       .87          .83 
February 1982      .84       .85          .79 
July 1982          .87       .87          .81 

Reliability of the passing standards for the objective tests is estimated 
with the Brennan-Kane (B-K) Index of Dependability. This index is an estimate 
of the consistency of test scores in classifying examinees as masters or 
nonmasters of the minimal performance standards. The high B-K coefficients of 
the tests (refer to Table 7) indicate that the candidates' scores are consistent 
with their classification as masters or nonmasters. Refer to Appendix B for the 
statistical formula for the B-K Index. 
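
As an illustration of how such an index behaves, the sketch below uses a 
commonly cited computational form of the Brennan-Kane index for 
dichotomously scored items, based on the mean and variance of examinees' 
proportion-correct scores and the cut score expressed as a proportion. 
Appendix B is not reproduced here, so the assumption that it gives this same 
form is unverified. The scores and cut score are hypothetical. 

```python
from statistics import mean, pvariance


def brennan_kane_index(total_scores, n_items, cut_score):
    """Index of dependability for a cut score on an n-item 0/1-scored test."""
    proportions = [score / n_items for score in total_scores]
    mean_prop = mean(proportions)
    var_prop = pvariance(proportions)  # population variance across examinees
    cut_prop = cut_score / n_items
    # Estimated error variance for absolute (domain-referenced) decisions.
    error = (mean_prop * (1.0 - mean_prop) - var_prop) / (n_items - 1)
    return 1.0 - error / ((mean_prop - cut_prop) ** 2 + var_prop)


# Hypothetical raw scores on a four-item test.
scores = [4, 3, 2, 1, 0]
at_cut_of_three = brennan_kane_index(scores, n_items=4, cut_score=3)
at_the_mean = brennan_kane_index(scores, n_items=4, cut_score=2)
```

The index is lowest when the cut score sits at the group mean and rises as 
the cut score moves away from it, which is one reason values for a passing 
standard can exceed the KR-20 coefficient for the same test. 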



TABLE 7 
Brennan-Kane Indices 

                                        Professional 
                   Math     Reading     Education 

October 1981       .94       .94          .96 
February 1982      .96       .94          .96 
July 1982          .95       .94          .95 



Reliability of Scoring of the Writing Subtest 

The major reliability consideration for the Writing test is the 
inter-judge reliability of ratings. The Writing test is a production writing 
sample that addresses one of two specific topics. The essays are rated 
independently by three judges with a referee to reconcile discrepant scores. 
Original reliability data were obtained from a study in which essays were 
written by 360 teacher education students at two universities. Raters were 
trained by the same procedures which are being used in the actual test 
administrations. The reliability of the scoring process is monitored at the 
University of Florida for each test administration. (Refer to Appendix D for 
additional information about the scoring of the Writing test.) 

Two approaches are used to estimate the reliability. First, four indices 
of interrater agreement are computed. These four indices are: (a) percent 
complete agreement; (b) average percent of two of the three raters agreeing; 
(c) average percent agreement by pairs as to pass/fail; and (d) percent 
complete agreement about pass/fail. The second approach for reliability 
estimation is the calculation of coefficient alpha for the raters and the rating 
team. This coefficient indicates the expected correlation between the ratings 
of the team on this task and those of a hypothetical team of similarly comprised 
and similarly trained raters doing the same task. Field test interrater 
reliability data and coefficient alpha for the inter-rater reliabilities are 
reported in Tables 3.4 and 3.5 of Bulletin IV (pp. 22-23). Refer to Table 8 for 
rater reliability data for the 1981-1982 FTCE administrations. 

TABLE 8 

Percentage of Rater Agreement for FTCE 
Writing Test 

                                          October    February    July 
                                          1981       1982        1982 

Index 1 - % Complete Agreement            46.44      38.55       38.60 

Index 2 - Average % Two of the Three 
          Raters Agreeing                 99.87      100.00      99.81 

Index 3 - Average % Agreement by 
          Pairs as to Pass/Fail           98.10      96.75       96.72 

Index 4 - % Complete Agreement 
          About Pass/Fail                 97.15      95.12       95.08 

Coefficient Alpha 
  Topic 1                                 .84        .87         .82 
  Topic 2                                 .85        .85         .86 



Examination of the reliability data for the Writing test indicates that the
level of reliability achieved by the rating teams met acceptable standards for 
such ratings. 

Discrimination 

Item analysis for the FTCE includes examination of the items' capacity to
differentiate between ability groups and the evaluation of response patterns to
the individual items. The item analysis indices used are item difficulty level,
item discrimination index, and point-biserial correlation coefficients.

Item difficulty level, the percentage of examinees who answer each item
correctly, is calculated for each item. These percentages provide important
information because items in the moderate range of difficulty differentiate
relatively more examinees from each other than do extremely easy or extremely
difficult items.

Related to the item difficulty level is the item discrimination index (see
Appendix B), which is the extent to which each item contributes to the total
test in terms of discriminating between the high and low achievers with regard
to the total test score. Any item that is below .20 on this index is evaluated
for content and ambiguity of wording. Items that appear to be flawed are
revised or eliminated. The ranges for item difficulty level and corresponding
item discrimination indices are reported in Table 9.
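
As an illustration of how these two indices might be computed, the following Python sketch uses the upper and lower 27% score groups named in Appendix B and flags items whose discrimination index falls below .20. The function name, the data layout (one list of 0/1 item scores per examinee), and the handling of ties in total score are assumptions, not the procedure actually used for the FTCE.

```python
def item_statistics(responses, tail=0.27, flag_below=0.20):
    """Return (difficulty, discrimination, flagged) for each item.

    responses: list of per-examinee lists of 0/1 item scores.
    tail: fraction of examinees in each of the high and low score groups.
    """
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    # Rank examinees by total score (ties keep their input order: an assumption)
    order = sorted(range(n), key=lambda i: totals[i])
    group_size = max(1, round(tail * n))
    low, high = order[:group_size], order[-group_size:]
    stats = []
    for j in range(n_items):
        # Item difficulty: proportion of all examinees answering correctly
        difficulty = sum(r[j] for r in responses) / n
        r_upper = sum(responses[i][j] for i in high)
        r_lower = sum(responses[i][j] for i in low)
        # D = (R_u - R_l) / (T / 2), where T is the combined size of both groups
        discrimination = (r_upper - r_lower) / (0.5 * (2 * group_size))
        stats.append((difficulty, discrimination, discrimination < flag_below))
    return stats


# Hypothetical data: four examinees, four items
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
stats = item_statistics(data, tail=0.25)
```

In this small example the first item is answered correctly by everyone, so it cannot separate high from low scorers and is flagged for review.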



The number and percent of examinees who select each alternative response
(foil) are reported for each item in the multiple-choice tests. This foil
analysis permits further evaluation of response patterns to the individual items
and provides useful information about variations in response performance by
different groups. These data are provided to the Department of Education staff
and appropriate subcontractors and are not reported in this document.

Point-biserial correlation coefficients indicate the extent to which
examinees with high test scores tend to answer an item correctly and those with
low test scores tend to miss an item. While the item discrimination index is
based on the performance of high and low achievers, the point-biserial
coefficient includes the entire range of scores in the correlation, thereby
indicating the item-total correlation or the extent to which an item score
correlates with all other items measuring a particular subject area.
Statistical formulas for these indices are listed in Appendix B.
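
The point-biserial coefficient can be sketched in Python directly from the Appendix B formula, r_pb = ((m_s - m_u) / sigma) * sqrt(pq). The function name and sample data are hypothetical, and the use of the population standard deviation for sigma is an assumption; with that choice the result equals the ordinary Pearson correlation between the 0/1 item score and the total score.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between one 0/1 item and the total score."""
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    p = len(right) / len(item_scores)   # proportion answering the item right
    q = 1 - p
    sigma = pstdev(total_scores)        # population SD (assumption)
    return (mean(right) - mean(wrong)) / sigma * sqrt(p * q)


# Hypothetical data: the two highest scorers answered the item correctly
r_pb = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
```

The function assumes the item has at least one right and one wrong response; an item answered identically by every examinee has no defined coefficient.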
TABLE 9

Frequency of Items within Specific
Item Difficulty and Item Discrimination Ranges

MATH
                                     Item Discrimination Range
Item        .10 and
Difficulty    below  .11-.20  .21-.30  .31-.40  .41-.50  .51-.60  .61-.70  .71-.80  .81-.90 .91-1.00    TOTAL

.81-1.00         15       33       27       14        9        0        0        0        0        0       98
.61- .80          0        0        3        4       12        7        2        0        0        0       28
.41- .60          0        0        0        0        2        2        0        0        0        0        4
.21- .40          0        0        0        0        0        0        0        0        0        0        0
.0 - .20          0        0        0        0        0        0        0        0        0        0        0
TOTAL            15       33       30       18       23        9        2        0        0        0      130

READING
                                     Item Discrimination Range
Item        .10 and
Difficulty    below  .11-.20  .21-.30  .31-.40  .41-.50  .51-.60  .61-.70  .71-.80  .81-.90 .91-1.00    TOTAL

.81-1.00        139       50       19        6        0        0        0        0        0        0      214
.61- .80          0        0        3        7        6        5        0        0        0        0       21
.41- .60          1        0        0        1        1        2        0        0        0        0        5
.21- .40          0        0        0        0        0        0        0        0        0        0        0
.0 - .20          0        0        0        0        0        0        0        0        0        0        0
TOTAL           140       50       22       14        7        7        0        0        0        0      240

PROFESSIONAL EDUCATION
                                     Item Discrimination Range
Item        .10 and
Difficulty    below  .11-.20  .21-.30  .31-.40  .41-.50  .51-.60  .61-.70  .71-.80  .81-.90 .91-1.00    TOTAL

.81-1.00         32       42       28        2        0        0        0        0        0        0      104
.61- .80          2       16       24       27       15        1        0        0        0        0       85
.41- .60          3        1       17       18        4        1        0        0        0        0       44
.21- .40          1        1        2        2        0        0        0        0        0        0        6
.0 - .20          0        0        1        0        0        0        0        0        0        0        1
TOTAL            38       60       72       49       19        2        0        0        0        0      240

Means and ranges for the point-biserial correlation coefficients for the
three 1981-1982 administrations are reported in Tables 10 and 11.



TABLE 10

Mean Point-Biserial
Correlation Coefficients

                                         Professional
                    Math      Reading    Education

October 1981        .38       .32        .27
February 1982       .37       .31        .25
July 1982           .40       .34        .25




TABLE 11

Point-Biserial Correlation Coefficients Between
Correct Item Response and Subtest Score

Range of
Point-Biserial                                  Professional
Coefficients               Math      Reading    Education

.90-.99                      0          0           0
.80-.89                      0          0           0
.70-.79                      0          0           0
.60-.69                      1          0           0
.50-.59                     18          1           0
.40-.49                     38         55          14
.30-.39                     42         93          64
.20-.29                     23         78         100
.10-.19                      8         12          50
Below .10                    0          1          12

TOTAL ITEMS FOR 1981-82    130        240         240

The appropriateness of mean point-biserial correlation coefficients must be
evaluated in the context of a particular testing program. According to A
Reader's Guide to Test Analysis Reports (ETS, 1981), the mean biserial
correlation will be higher when the examinee group represents a wide range of
ability or knowledge or when the test items are very similar in content. Since
the FTCE Reading and Professional Education tests are relatively easy, the
scores were not greatly different. Thus, variability was reduced, and the
point-biserial correlation coefficients were attenuated.

Contrasting Group Performance

To the extent that scores on a test reflect group membership rather than
the knowledge or skill that the test is designed to measure, the test is
invalid. Although not all groups necessarily exhibit the same performance level
in different areas of achievement, the procedure for analyzing contrasting group
performance is to screen for any specific areas or items on which group
performance diverges. Extensive review procedures were used during FTCE
development to ensure that the Examination content was an accurate
representation of candidate performance in terms of the competencies being
evaluated. The procedure included (a) a series of reviews during the item
development stage to screen for possibly offensive materials and for items that
might invalidate examinee performance and (b) statistical analysis of field
groups, ethnic groups, and program groups. These procedures are described in
Bulletin IV (pp. 33-38).

After each FTCE administration, test content is examined for contrasting
group performance. Score distributions and summary statistics (including mean,
median, and standard deviation of the distribution and an index of skewness) are
reported for each test. The content review for contrasting group performance
includes (a) examination of scatterplots of performance on individual items and
overall content by sex and ethnic category (male-female, black-white,
white-hispanic, and hispanic-black), (b) analysis of performance by groups based
on their test scores, and (c) individual item analysis by sex and ethnic
category to screen for items that may discriminate negatively for a specific
group.



Scatterplots

Scatter diagrams are graphic representations of the extent to which
performance by two separate groups is related. Twelve scatterplots are produced
for each FTCE administration, comparing performance by sex and ethnic category
for each subtest. Entries that depart from the general pattern indicate that
one group is performing differently from another group on specific items. In
such cases, entries that depart substantially from the general pattern of other
entries are reviewed for content that could account for differences in
performance level. Items that are determined to be flawed during this review
are revised or deleted from the item pool. An example of a scatterplot is
illustrated by Figure 1.

Figure 1. Scatterplot of Percent of Examinees Who Answered
Specific Items Correctly.*

*Performance for males is plotted on the vertical axis while
performance for females is plotted on the horizontal axis.
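
The report describes the scatterplot review as a visual inspection. One way such a screen could be automated is sketched below in Python: fit a least-squares line through the per-item percent-correct values of two groups and list the items that lie far from it. The line fit, the function name, and the 10-point threshold are all assumptions made for illustration and are not taken from the FTCE procedure.

```python
from statistics import mean


def flag_departing_items(pct_vertical, pct_horizontal, threshold=10.0):
    """Return indices of items that depart from the general scatterplot pattern.

    pct_vertical, pct_horizontal: percent correct per item for the groups
    plotted on the vertical and horizontal axes.
    threshold: hypothetical cutoff, in percentage points, for the vertical
    distance of an item from the least-squares line.
    """
    mx, my = mean(pct_horizontal), mean(pct_vertical)
    sxx = sum((x - mx) ** 2 for x in pct_horizontal)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(pct_horizontal, pct_vertical))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [
        i
        for i, (x, y) in enumerate(zip(pct_horizontal, pct_vertical))
        if abs(y - (intercept + slope * x)) > threshold
    ]


# Hypothetical percent-correct values for five items
horizontal_group = [50, 60, 70, 80, 90]
vertical_group = [50, 60, 40, 80, 90]
flagged = flag_departing_items(vertical_group, horizontal_group)
```

An item flagged this way is only a candidate for content review; as in the report, the decision to revise or delete it rests on examining the item itself.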

Subtest Performance by Groups

The number and percentage of candidates who pass all subtests and
individual subtests are reported by sex and ethnic designation after each FTCE
administration. Table 12 displays these data.

TABLE 12

Number and Percent Passing All Subtests and
Each Subtest by Total Candidates and by Sex and Ethnic Designation*

First-Time Candidates

                   All
                Candidates
              TOT     N    %     TOT     N    %     TOT     N    %     TOT     N    %     TOT     N    %

ENTIRE TEST  9616  8236   86    2101  1686   81    7513  6550   87    8305  7590   91     809   330   41
MATH         9647  8772   91    2110  1906   90    7537  6866   91    8325  7914   95     815   457   56
READING      9642  9008   93    2109  1900   90    7533  7108   94    8323  8069   97     814   549   67
PROF ED      9641  9287   96    2107  1980   94    7534  7307   97    8320  8231   99     815   618   76
WRITING      9635  9113   95    2107  1905   90    7528  7210   96    8320  8134   98     810   594   73

American Indian/
Alaskan
              TOT     N    %     TOT     N    %     TOT     N    %     TOT     N    %

ENTIRE TEST   348   200   57      16    14   88      24    13   54     114    89   78
MATH          353   271   77      16    15   94      24    19   79     114    96   84
READING       351   255   73      16    16  100      24    21   88     114    98   86
PROF ED       352   299   85      16    15   94      24    21   88     114   103   90
WRITING       351   248   71      16    16  100      24    19   79     114   104   91

*Numbers in this table represent data from the three FTCE administrations (1981-1982) presented in this report.

Item Analysis by Sex and Ethnic Category

Separate item analyses (including item difficulty levels, item
discrimination indices, point-biserial correlations, foil analyses of
alternative response choices, and KR-20 estimates of reliability) are reported
for each sex and ethnic category. The item analysis process includes the
screening of individual test items that may discriminate negatively for a
specific group. When an outlying entry is identified on a scatter diagram, the
item content is carefully reviewed to determine the necessity of deleting or
revising the item. Foil analyses may also provide useful information with regard
to contrasting group performance. Variations in response patterns by groups to
different foils (alternative responses) may indicate the need for item revision.

The procedures described in this section (scatter diagrams, the analysis of
subtest performance by groups, and item analysis by sex and ethnic category)
are used to ensure that scores obtained on the FTCE are accurate
representations of the candidates' performance levels in terms of the
competencies that are addressed and are not a reflection of membership in a
specific sex or ethnic category.

SUMMARY



The Florida Teacher Certification Examination (FTCE) is an examination
based upon selected competencies that have been identified by Florida educators
as minimal entry-level skills for prospective teachers. In order to develop the
Examination, the following tasks had to be accomplished: (a) planning;
(b) writing and validation of test items; (c) field-testing the examination
items; (d) setting passing scores; and (e) preparing for test assembly,
administration, and scoring. The competencies (described in Appendix A) have
been adopted by the Board of Education as curricular requirements for teacher
education programs in the colleges and universities in Florida.

The FTCE consists of three objective tests (Reading, Mathematics, and
Professional Education) and an essay test (Writing) that is scored by trained
readers. The general test content is as follows:



Test              Content

Writing           One of two general topics

Reading           General education passages derived from textbooks,
                  journals, state publications

Mathematics       Basic mathematics, simple computation,
                  and "real world" problems

Professional      General education including personal, social,
Education         academic development, administrative skills,
                  exceptional student education



Developmental items are included in the Examination along with regular test items.
These developmental items are not counted in computing an individual's score.

The psychometric characteristics of validity, reliability, item
discrimination, and contrasting group performance of the FTCE are described in
this report. The validity of the examination has been well established as a
result of (1) the extensive involvement of education professionals in the
identification and explication of the necessary competencies and their
associated subskills, (2) the precise item specifications which guided the item
writers, and (3) reviews of the items and the competencies/skills that they were
designed to measure. The reliability data indicate that the test items are
consistent measures of the three subject areas and that the examinees' scores
are consistent with their classification as masters or nonmasters of the minimal
performance standards. The reliability data for the Writing test demonstrate
that the scoring by the writing teams meets acceptable standards of consistency.
Item analyses for the FTCE examine the power of the items to differentiate
between ability groups and evaluate response patterns to individual items. The
indices that are used to monitor the differences between ability groups are item
percent correct, item discrimination index, and point-biserial correlation
coefficients. Additional item analysis procedures (scatter diagrams, the
analysis of subtest performance by groups, and item analysis by sex and ethnic
category) are used. These procedures ensure that scores
obtained on the FTCE are accurate representations of the candidates' performance
levels in terms of the competencies that are addressed and are not a reflection
of membership in a specific sex or ethnic category.

The FTCE is administered three times a year in selected locations
throughout the state. Data from this report indicate that the percentages of
candidates who passed the entire FTCE for the October 1981, February 1982, and
July 1982 administrations were 84 percent, 84 percent, and 85 percent,
respectively. Examinees who do not pass all of the tests at one administration
may retake the tests not passed at later scheduled testing dates.

APPENDIX B
Mathematical Illustrations of Formulas

The following formulas were used in the calculation of statistics for the
FTCE:



(a) Point-Biserial Correlation

\[ r_{pb} = \frac{m_s - m_u}{\sigma} \sqrt{pq} \]

where $r_{pb}$ = point-biserial correlation coefficient

      $m_s$ = mean total score of examinees answering item right

      $m_u$ = mean total score of examinees answering item wrong

      $\sigma$ = standard deviation of total score for entire group

      $p$ = proportion of examinees getting item right

      $q = 1 - p$



(b) Kuder-Richardson Formula 20 Reliability Coefficient

\[ KR_{20} = \frac{I}{I - 1} \left[ 1 - \frac{\sum pq}{\sigma^2} \right] \]

where $I$ = number of items/questions (any omitted questions not included)

      $p$ = proportion of examinees getting item right

      $q = 1 - p$

      $\sigma^2$ = variance of the total score
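
A minimal Python sketch of the KR-20 computation follows. The function name and sample responses are hypothetical, and the use of the population variance (dividing by the number of examinees) for the total-score variance is an assumption, since the report does not state which divisor was used.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: list of per-examinee lists of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    # Sum of p*q over items, where p is the proportion answering correctly
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(r[j] for r in responses) / n_examinees
        sum_pq += p * (1 - p)
    total_variance = pvariance(totals)  # population variance (assumption)
    return n_items / (n_items - 1) * (1 - sum_pq / total_variance)


# Hypothetical data: four examinees, four items
reliability = kr20([[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
```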



(c) Standard Error of Measurement

\[ \sigma_s = \sigma_x \sqrt{1 - r_{xx}} \]

where $\sigma_s$ = standard error of measurement

      $\sigma_x$ = the standard deviation of total scores

      $r_{xx}$ = the reliability coefficient
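
As a brief worked illustration in Python, the formula can be applied to hypothetical values: a total-score standard deviation of 5.0 and a reliability coefficient of .91. These numbers are chosen for the example and are not FTCE results.

```python
from math import sqrt


def standard_error_of_measurement(sd_total, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability coefficient)."""
    return sd_total * sqrt(1.0 - reliability)


# Hypothetical inputs: SD = 5.0 score points, reliability = .91
sem = standard_error_of_measurement(sd_total=5.0, reliability=0.91)
# sqrt(1 - .91) = .30, so the SEM is about 1.5 score points
```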

(d) Item Discrimination Index

\[ D = \frac{R_u - R_l}{\tfrac{1}{2}(T)} \]

where $R_u$ = number of students in high score range (i.e., the upper 27%)
              who answered the item correctly

      $R_l$ = number of students in low score range (i.e., the lower 27%)
              who answered the item correctly

      $T$ = total number in the upper and lower groups

(e) Coefficient Alpha

Coefficient alpha is used as an estimate of the inter-rater reliability
of Writing test scores. This coefficient indicates the expected correlation
between the ratings of the team on this task and those of a hypothetical team
of similarly comprised and similarly trained raters doing the same task.

\[ r_{kk} = \frac{k}{k - 1} \left[ 1 - \frac{\sum \sigma_i^2}{\sigma_y^2} \right] \]

where $r_{kk}$ = coefficient of reliability (alpha)

      $k$ = number of test items

      $\sum \sigma_i^2$ = sum of the variances of each item

      $\sigma_y^2$ = variance of the examinees' total test score
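
For the Writing test, the formula can be sketched in Python by treating each rater as an "item" and each essay's summed rating as the total score. That mapping, the function name, the sample ratings, and the use of population variances are assumptions made for illustration.

```python
from statistics import pvariance


def coefficient_alpha(ratings):
    """Coefficient alpha with raters treated as items.

    ratings: list of per-essay tuples, one rating per rater.
    """
    k = len(ratings[0])                      # number of raters ("items")
    totals = [sum(r) for r in ratings]       # summed rating per essay
    # Sum of the variances of each rater's scores across essays
    sum_item_variances = sum(
        pvariance([r[j] for r in ratings]) for j in range(k)
    )
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))


# Hypothetical ratings by three raters for four essays (1-to-4 scale)
alpha = coefficient_alpha([(4, 4, 3), (3, 3, 3), (2, 3, 2), (1, 2, 1)])
```

When every "item" is scored 0 or 1, each item variance equals pq, so this computation reduces to the KR-20 formula in (b).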



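(f) Brennan-Kane Reliability

\[ M(C) = 1 - \frac{1}{n_i - 1} \cdot \frac{\bar{X}_{PI}\,(1 - \bar{X}_{PI}) - s^2(\bar{X}_{pI})}{(\bar{X}_{PI} - C)^2 + s^2(\bar{X}_{pI})} \]

where $n_i$ = number of items

      $\bar{X}_{PI}$ = grand mean over $n_p$ persons and $n_i$ items

      $C$ = cutoff score, expressed as a proportion of items

      $s^2(\bar{X}_{pI})$ = sample variance of persons' mean scores over items;
                            that is, $SS_{persons} / (n_p n_i)$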
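
The Brennan-Kane index can be sketched in Python as follows. This is an illustration under stated assumptions: the function name and data are hypothetical, the cutoff is expressed as a proportion of items correct, and the variance of persons' mean scores is computed with the number of persons as the divisor.

```python
from statistics import mean, pvariance


def brennan_kane_index(responses, cutoff):
    """Brennan-Kane index M(C) for 0/1 item scores.

    responses: list of per-person lists of 0/1 item scores.
    cutoff: passing score as a proportion of items correct (assumption).
    """
    n_items = len(responses[0])
    # Each person's mean score over items (proportion correct)
    person_means = [sum(r) / n_items for r in responses]
    grand_mean = mean(person_means)
    # Variance of person means, divisor n_p (assumption)
    s2 = pvariance(person_means)
    numerator = grand_mean * (1 - grand_mean) - s2
    denominator = (grand_mean - cutoff) ** 2 + s2
    return 1 - (1 / (n_items - 1)) * numerator / denominator


# Hypothetical data: four persons, four items, cutoff at 50% correct
index = brennan_kane_index(
    [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]], cutoff=0.5
)
```

Because the squared distance between the grand mean and the cutoff appears in the denominator, the index rises as the group mean moves away from the passing score, which is consistent with high values on relatively easy tests.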

SECURITY



A security and quality control plan has been developed and implemented for
the program. Components of the security plan include:

1. Controlled, limited access to all examination materials;

2. Shredding of developmental materials and used booklets;

3. Strict accounting of all materials and of persons working with
test items at the testing agency and test centers.

A signed security document is obtained from every individual who has access
to the examination materials. The security document contains an agreement that
the individual will not reveal, in any manner to any other individuals, the
examination items, paraphrases of the examination items, or close approximations
to the examination items. Only persons who have a "need to see" the items
because of their work on the project are allowed to view any parts of the
examination.



Test Security During the Administration 

During the production phases of this project all typing and reproduction are
done by persons who have security clearances. All materials are signed out when
they are removed from locked storage and checked in when they are returned. One
person is assigned responsibility for the secure files while all work is in
process; this person is able to account for all materials at all times.
Material that needs to be revised and unusable materials are not placed in
wastebaskets but are kept in a locked file for special destruction.

The following plan has been implemented to ensure rigorous security of all
materials during actual examination administration. Materials remain in secure
storage at the test centers until the morning of the test date. If multiple
rooms are used at a center, each room is assigned blocks of materials that must
be signed for by a room supervisor, the only person who has access to the room
supply. Test books and materials are never left unguarded. Candidates are
assigned seats by center personnel. The seating arrangements minimize the
possibility of a candidate seeing the papers of other candidates. Books are
distributed by the room supervisor and proctors. Each booklet is handed to the
examinees individually and the examinees sign a receipt for the booklets by
serial number. Immediately after distribution, an inventory is taken to ensure
that the sum of the distributed and unused books equals the number of books
assigned to the testing room. Any discrepancy is reported to the center
supervisor and immediate steps are taken to reconcile the discrepancy and locate
the missing material. Every such incident is reported to the Project Manager,
and appropriate action is instituted to prevent further occurrences and to
recover any missing materials.

Candidates cannot leave the room during a test session except for an
emergency. If a candidate must leave the room, materials are delivered to the
room supervisor or proctor and held until the candidate's return. No materials
may be removed from the test room at any time. Only one candidate may leave the
room at a time. Provision of a break between subtests reduces the need for
candidates to leave during a test session. At the end of a session all test
books are collected and accounted for before collection of the answer documents.
After the answer documents are accounted for, all candidates in the room are
dismissed. Upon return to their original seats, candidates are reidentified by
test administration personnel before the distribution of materials for the next
subtest. During breaks and the lunch period, all materials are either locked in
secure storage or are placed under direct supervision of test administration
personnel. All used and unused materials are returned to locked storage
immediately after test administration.



Quality Control 

To ensure quality control during the scoring and reporting process, the 
following procedures are used: 

1. Each answer document is checked for proper coding and marking in
response areas;

2. Computer edit programs are used to check for valid program codes
on the registration forms and for matching names and social security
numbers on the registration and scoring files;

3. Test data are used to verify the accuracy of all scoring and reporting
programs;

4. Sample data are drawn prior to scoring from each administration to
screen for key, printing, or procedural errors;

5. Random answer documents are hand-scored during the scanning process to
verify proper operation of the scanner;

6. A complete review of all procedures, which includes hand-checking a
sample of test data, is completed by members of the University of
Florida Office of Instructional Resources and the Department of
Education before printing the candidate score reports;

7. Analyses of the holistic scoring process are conducted. This review
addresses the overall reliability of the ratings, the distribution of
scores, and number of refereed scores for each reader. Specific
procedures for quality control during the holistic scoring process are
documented in the Procedural Manual for Holistic Scoring;

8. The accuracy of the calculations for the institutional and state
reports is hand-verified.

HOLISTIC SCORING OF THE WRITING SUBTEST
OF THE FLORIDA TEACHER CERTIFICATION EXAMINATION



The Writing Subtest 

The Writing Subtest was designed to assess a candidate's ability to write 
in a logical, easily understood style with appropriate grammar and sentence 
structure. The subskills to be measured are: 

a. Uses language at the level appropriate to the topic and reader; 

b. Comprehends and applies basic mechanics of writing: spelling,
capitalization, and punctuation;

c. Comprehends and applies appropriate sentence structure; 

d. Comprehends and applies basic techniques for the organization of 
written material; 

e. Comprehends and applies standard English usage in written
communication.

The candidate is given a choice between two topics on which to write an 
essay during the 45-minute examination period. This essay should demonstrate 
the competency and subskills specified above. The essay or writing sample is 
scored holistically by at least three trained and experienced judges. 



The Process of Holistic Scoring 

Holistic Scoring Defined 

Holistic scoring or evaluation is a process for judging the quality of
writing samples. It has been used for many years by professional testing
agencies for credit-by-examination, state assessment, and teacher certification
programs.

Essays are scored holistically, that is, for the total, overall impression
they make on the reader, rather than for an analysis of specific features of a
piece of writing. Holistic scoring assumes the skills which make up the ability
to write are closely interrelated and that one skill cannot be separated from
the others. Thus, the writing is viewed as a total work in which the whole is
something more than the sum of the parts. A reader reads a writing sample
quickly, once. He or she obtains an impression of its overall quality and then
assigns a numerical rating to the paper based on judgments of how well it meets
a particular set of established standards.



The Reader 

The key to the effectiveness of the holistic scoring process is the readers,
who must make valid and reliable judgments. Readers must bring to the process
experience in teaching and grading English compositions. In addition, they must
be willing to undergo training in holistic scoring, which demands they set aside
personal standards for judging the quality of a writing sample and adhere to
standards which have been set for the examination. The goal for the reading of
the Writing Subtest of the Florida Teacher Certification Examination is to rate
a large number of essays according to their overall competence in a consistent
or reliable manner according to previously established standards based on a set
of defined criteria. By undergoing a set of training procedures a group of
experienced teachers of composition can develop a high level of consistency in
making judgments about the quality of a group of essays.



The Criteria 

The criteria established to score the essays for the Florida Teacher 
Certification Examination are listed below. They were developed to accommodate 
specific conditions imposed by the Writing Subtest: 

(1) They reflect those characteristics widely accepted as indicative of 
good writing; 

(2) They can be translated into operational descriptions of levels of
competence;

(3) They reflect the general competency statement and subskills 
identified by the Council on Teacher Education. 



Specific Criteria for Evaluation of Essays 

1. Rhetorical Quality 

1.1 Unity: An ordering and interdependence of parts producing a single 
effect: completeness. 

1.2 Focus: Concentration of a topic; the presence of a "center of 
gravity." 

1.3 Clarity: Lucidity of expression; lack of ambiguity and distortion. 

1.4 Sufficiency: Appropriate depth and breadth of expression to meet
the writer's purposes and the demands of the particular topic.

2. Structural and Mechanical Quality 

2.1 Organization: Consistent and coherent integration and connection of 
parts. 

2.2 Development: Appropriate and sufficient exposition of ideas; use of
detail, examples, illustration, comparisons, etc.

2.3 Paragraph and Sentence Structure: Appropriate form, variety, logic, 
relatedness of and among structural units. 

2.4 Syntax: Appropriate ordering of words to convey intended meaning. 



3. Observance of Conventions in Writing

3.1 Usage: Appropriate use of language features: inflections, tense,
agreement, pronouns, modifiers, vocabulary, level of discourse, etc.

3.2 Spelling, Capitalization, Punctuation: Consistent practice of
accepted forms.

The relationship between the subskills and the scoring criteria is
illustrated in the figure below.



RHETORICAL 



STRUCTURAL 



CONVENTIONAL 



ESSENTIAL COMPETENCIES: 
Demonstrate the ability to 
write in a logical, easily 
understood style with 
appropriate grammar and 
sentence structure. 

a. Use language appropriate 
to the topic and reader. 

b. Apply basic mechanics of 
writing. 

c. Apply appropriate sentence
structure.

d. Apply basic techniques for
organization.

e. Apply standard English 
usage. 

Operational Descriptions

The operational descriptions based on the scoring criteria reflect the four
levels of competency which the readers are to assign each of the essays they
read. Each reader will independently score or rate a paper on a scale of 1 to
4, with 4 being the highest rating. The descriptions which follow are an
attempt to express clearly and precisely the general, overall impressions a
reader has in terms of the criteria when he or she reads essays of varying
quality. The number of levels of competence could be expanded or
decreased. However, for the task of scoring the Writing Subtest, four levels
provide enough degrees of distinction to be meaningful yet manageable for large
scale testing.

4. The essay is unified, sharply focused, and distinctively effective.
It treats the topic clearly, completely, and in suitable depth and
breadth. It is clearly and fully organized, and it develops ideas with
consistent appropriateness and thoroughness. The essay reveals an
unquestionably firm command of paragraph and sentence structure.
Syntactically, it is smooth and often elegant. Usage is uniformly
sensible, accurate, and sure. There are very few, if any, errors
in spelling, capitalization, and punctuation.

3. The essay is focused and unified, and it is clearly if not
distinctively written. It gives the topic an adequate though not
always thorough treatment. The essay is well organized, and much of
the time it develops ideas appropriately and sufficiently. It shows a
good grasp of paragraph and sentence structure, and its usage is
generally accurate and sensible. Syntactically, it is clear and
reliable. There may be a few errors in spelling, capitalization, and
punctuation, but they are not serious.

2. The essay has some degree of unity and focus, but each could be
improved. It is reasonably clear, though not invariably so, and it
treats the topic with a marginal degree of sufficiency. The essay
reflects some concern for organization and for some development of
ideas, but neither is necessarily consistent nor fully realized.
The essay reveals some sense, if not full command, of paragraph and
sentence structure. It is syntactically bland and, at times,
awkward. Usage is generally accurate, if not consistently so. There
are some errors in spelling, capitalization, and punctuation that
detract from the essay's effect if not from its sense.

1. The essay lacks unity and focus. It is distorted and/or ambiguous,
and it fails to treat the topic in sufficient depth and breadth.
There is little or no discernible organization and only scant
development of ideas, if any at all. The essay betrays only
sporadically a sense of paragraph and sentence structure, and it is
syntactically slipshod. Usage is irregular and often questionable or
wrong. There are serious errors in spelling, capitalization, and
punctuation.



Training of Readers 

The training of readers for the Writing Subtest of the Florida Teacher
Certification Examination consists of three steps:






1. Acquiring information about the examination and holistic scoring
process.

2. Reading and scoring essays which have been selected as good examples
of the various levels of competence in writing. The practice essays
have been scored by experienced readers and annotated in accordance
with the operational descriptions. By reading, scoring and
discussing the essays, the readers practice until they consistently
give the same ratings to essays as the experienced readers.

3. Reading and scoring a sample of the actual Writing Subtests which
have been selected and scored prior to the training session. These
samples will serve as the standards for the scoring of the
examination and will include essays which represent each of the
competency levels. As in Step 2, the emphasis will be on each reader
assigning scores which agree with those established earlier by the
experienced readers. This step occurs immediately before the actual
scoring session and often is repeated during the session to ensure
continued consistency or reliability of assigned scores or ratings.



Setting the Standards 

Prior to Step 3 in the training, standards for the Writing Subtest are 
established. The Chief Reader, who is responsible for conducting the holistic 
scoring, and his assistants, the Assistant Chief Reader and the Table Leaders, 
select, at random, a sample of papers from the total group of essays written on 
a particular topic. These papers are read and scored independently by each
person. Results are compared and consensus is reached for the identification of
four papers. Each becomes a standard for one of the four competency levels.
Additional papers are chosen to be used in Step 3 of the training procedures. 
This process is repeated for the second topic of the Writing Subtest. 



The Scoring Session 

The scoring session begins immediately after Step 3. Readers are assigned
to tables in groups of four or five. The number of readers and the number of
tables are determined by the number of essays to be scored. Each table of
readers is also assigned a Table Leader. The Table Leader's primary task is to
continually monitor the scoring process and consult with readers as questions or
"problem" papers arise. The Table Leader is an experienced reader who has 
helped set the standards. 

Each reader is given a set of papers to read, rate and mark with a score. The
identity of the writer is not known to the reader. The papers range, on the
average, from 200 to 400 words in length, and each can be read and scored
holistically in approximately two minutes. As the scoring of a set of papers is
completed by a reader, a clerk collects and returns the papers to an operations
table. The scores given by the reader are covered, and the papers are
redistributed to another set of folders and delivered by the clerk to a second
table of readers. This procedure continues until each paper has been read by
three different readers. Each reader reads, judges and scores at his own pace.
Scoring sessions are approximately three hours long, with ten-minute breaks each
hour. Usually there are two scoring sessions for each day of holistic reading.

After a paper has been scored by three different readers, the scores are
examined at the operations table. If one of the scores varies from any other by
two levels or more (e.g., 3-3-1), the paper is sent to the Chief Reader or
Assistant Chief Reader who serves as referee. This person assigns a rating
which replaces the discrepant score. Papers whose original ratings are 1-2-3 or
2-3-4 are refereed and scored as follows:

All initial scores of 5 will be refereed. If any paper is refereed and a
discrepancy still occurs, the essay is submitted to a new team of readers until
consistency is obtained.

The three scores are then added together for a total score. Thus the 
lowest score possible is a 3, the highest, 12. 
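
The discrepancy check and the total score described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure's actual implementation: the function names are hypothetical, the ratings are assumed to be whole numbers from 1 to 4 (one per competency level), and the special handling of 1-2-3 and 2-3-4 papers and of initial scores of 5 is not modelled because this section does not spell it out.

```python
VALID_LEVELS = {1, 2, 3, 4}  # assumption: the four competency levels


def needs_referee(scores):
    """Return True if any rating differs from any other by two or more levels."""
    if len(scores) != 3 or any(s not in VALID_LEVELS for s in scores):
        raise ValueError("expected three ratings, each between 1 and 4")
    # The largest gap between any two ratings is max minus min.
    return max(scores) - min(scores) >= 2


def total_score(scores):
    """Sum three consistent ratings into a total between 3 and 12."""
    if needs_referee(scores):
        raise ValueError("discrepant ratings must be refereed before totaling")
    return sum(scores)
```

Under these assumptions a paper rated 3-3-1 is flagged for the referee, while a paper rated 2-3-3 is totaled directly as 8.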

Final Steps 

After the reading sessions are completed, Table Leaders evaluate the 
performance of Readers. The Chief Reader evaluates the Table Leaders. Readers 
are asked for comments and suggestions for improving training and scoring
procedures. 

Two approaches for reliability estimation are the percentage of rater 
agreement and the calculation of coefficient alpha for the raters and the rating 
team, which indicates the expected correlation between the ratings of the team 
and those of a hypothetical team of similarly comprised and similarly trained 
raters doing the same task. The four indices that represent rater agreement
are: (a) percent complete agreement; (b) average percent of two of the three
raters agreeing; (c) average percent agreement by pairs as to pass/fail; and
(d) percent complete agreement about pass/fail. These data are reported in
Table 8 of this report. 
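
As a rough illustration of these reliability estimates, the following Python sketch computes percent complete agreement, an average pairwise agreement, and coefficient alpha from a list of three-rater score tuples. It is a sketch under stated assumptions rather than the computation used for this report: the function names and data layout are hypothetical, the pairwise figure is only one possible reading of index (b), the pass/fail indices (c) and (d) are omitted because no cut score is given here, and alpha uses the standard formula k/(k-1) * (1 - sum of rater variances / variance of totals).

```python
from itertools import combinations
from statistics import variance


def percent_complete_agreement(ratings):
    """Percent of papers on which all raters gave the same score."""
    agree = sum(1 for paper in ratings if len(set(paper)) == 1)
    return 100.0 * agree / len(ratings)


def mean_pairwise_agreement(ratings):
    """Average, over all rater pairs, of the percent of identical scores.

    Assumption: one possible reading of index (b), not the report's definition.
    """
    n_raters = len(ratings[0])
    percents = []
    for i, j in combinations(range(n_raters), 2):
        same = sum(1 for paper in ratings if paper[i] == paper[j])
        percents.append(100.0 * same / len(ratings))
    return sum(percents) / len(percents)


def coefficient_alpha(ratings):
    """Coefficient alpha, treating each rater as an 'item'."""
    k = len(ratings[0])
    # Sample variance of each rater's scores across papers.
    rater_vars = [variance([paper[i] for paper in ratings]) for i in range(k)]
    # Sample variance of the summed (total) scores across papers.
    total_var = variance([sum(paper) for paper in ratings])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)
```

Each element of `ratings` is assumed to be one paper's scores in a fixed rater order, for example `[(3, 3, 3), (2, 3, 3), (1, 1, 2), (4, 4, 4)]`. In an actual scoring session the three readers differ from paper to paper, so the rater "positions" here are a simplification.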